Much has been said about cognitive technologies, which include artificial intelligence (AI), machine learning (ML), natural language processing (NLP), robotic process automation, and rule-based or expert systems. Much fear—of robot overlords taking over the world (and our brains)—has been raised at the same time. Tad Friend’s article “How Frightened Should We Be of A.I.?” (The New Yorker, May 14, 2018), for instance, explored the fear factor in detail. Then there is Kashyap Vyas’s “7 Ways AI Will Help Humanity, Not Harm It” (Interesting Engineering, December 3, 2018), which offers a completely different line of thought.
For India’s digital solutions vendors, the application of AI and NLP has been ongoing for years, primarily to accelerate internal production processes and meet the publishing industry’s demands for faster, cheaper production and shorter turnaround times. From sifting through large volumes of incoming content to flagging content and process anomalies, AI/NLP has been indispensable.
A Boon to the Editorial Process
In STM publishing, especially, AI is being used effectively for editorial functions. “The functions range from building data sets that cover editorial and author preferences to writing and formatting styles,” says V. Bharathram, president of Lapiz Digital. “Writing meaningful alternate texts in accessibility solutions is another main area that has benefitted from AI. But the most important advantage of applying AI lies in minimizing human errors and negligence in the workflow.”
AI-based tools, Bharathram explains, can read and store the details of an input document and record the changes made by the editor, accumulating those changes continuously across the various documents being edited. “Specific algorithms can also be included based on rules for count and noncount nouns, phrase and clause rules, punctuation rules, prepositions of time, subject-verb agreement, and so on. Subsequently, when the editor starts editing a new document, the AI will automatically apply all these rules with the built-in algorithm in the document. It would then be sufficient for the editor to do only a random check at a later stage.”
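In rough outline, that kind of rule replay can be pictured with a short sketch. The rules below are invented for illustration (they are not Lapiz Digital’s actual rule set), but they show how corrections recorded from earlier edits could be reapplied automatically to new text, leaving the editor with only a spot check:

```python
import re

# Hypothetical illustration of replaying style rules an editor has already applied.
# The rules below are invented for the example, not an actual vendor rule set.
LEARNED_RULES = [
    (re.compile(r"\butilise\b"), "utilize"),   # preferred spelling recorded from earlier edits
    (re.compile(r"\bdata is\b"), "data are"),  # house style for a noncount/plural noun
    (re.compile(r"\s+,"), ","),                # punctuation: no space before a comma
]

def apply_learned_rules(text: str) -> tuple[str, list[str]]:
    """Apply previously recorded editing rules and log which ones fired."""
    log = []
    for pattern, replacement in LEARNED_RULES:
        text, count = pattern.subn(replacement, text)
        if count:
            log.append(f"{pattern.pattern} -> {replacement!r} ({count}x)")
    return text, log

edited, changes = apply_learned_rules("The data is clear , so we utilise it.")
print(edited)   # The data are clear, so we utilize it.
print(changes)  # which rules fired, for the editor's random check
```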
With AI, the reference segment of an STM book or journal can be handled effectively using an online or in-house reference library. “AI can apply and highlight keywords such as author name, product name, subject, and bibliographic details for the indexing process by referring to general master rules and the keyword library. This will reduce human errors and speed up the process,” Bharathram adds.
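A bare-bones illustration of that keyword lookup follows; the mini library and tag categories are invented stand-ins for the kind of in-house reference library Bharathram mentions:

```python
# Hypothetical sketch of keyword tagging against a small reference library.
# The library entries and tag names here are invented for the example.
KEYWORD_LIBRARY = {
    "author": {"Darwin", "Curie", "Turing"},
    "subject": {"evolution", "radioactivity", "computation"},
}

def tag_reference(reference: str) -> dict[str, list[str]]:
    """Highlight known keywords in a reference string for the indexing process."""
    tokens = {token.strip(".,;()") for token in reference.split()}
    return {
        category: sorted(tokens & keywords)
        for category, keywords in KEYWORD_LIBRARY.items()
    }

print(tag_reference("Turing, A. M. (1950). Computing machinery and computation."))
# {'author': ['Turing'], 'subject': ['computation']}
```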
Intelligent automation of workflows applied at various points in the content life cycle can reduce time to publication, improve editorial quality, enhance author experience, and boost the immediacy of science, says Mike Groth, marketing director of Cenveo Publisher Services. “For example, some publishers are using AI and NLP in the peer review process to evaluate manuscripts, extract key terms for originality and relationship mapping, check for language quality, identify plagiarism, and even match papers to journals and reviewers.”
Tools within the Cenveo Smart Suite 2.0 publishing system, for instance, apply AI and NLP to track production and copyediting streams by quality, implement copy and formatting rules, facilitate self-service author proofing, and generate XML—which effectively reduce the number of touch points while flagging issues for human intervention when necessary.
Last year, Taylor & Francis went live with Cenveo’s Smart Suite, estimating that 25% of its article submissions are of high enough quality to advance through automated checks straight to the composition stage. “It saves up to 40% in overall production time. AI and NLP, however, will not replace editors but rather empower them to add more value by focusing on the science and not on the process,” Groth adds, pointing out that “intelligent automation allows publishers to scale up without necessarily increasing headcount, sacrificing quality or increasing costs.”
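In outline, that quality-based routing amounts to a scored triage step, as in the sketch below; the threshold values and check names are invented for illustration and are not Cenveo’s actual rules:

```python
# A minimal, hypothetical sketch of quality-based manuscript routing.
# The thresholds and checks are invented for illustration only.
def route_submission(language_score: float, structure_ok: bool, references_ok: bool) -> str:
    """Send clean manuscripts straight to composition; flag the rest for an editor."""
    if language_score >= 0.85 and structure_ok and references_ok:
        return "composition"      # skips manual copyediting
    if language_score >= 0.60:
        return "copyediting"      # normal editorial stream
    return "author-query"         # returned for revision before production

print(route_submission(0.92, True, True))   # composition
print(route_submission(0.70, True, False))  # copyediting
```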
Over at Lumina Datamatics, Big Data and predictive analytics have enabled clients to manage large volumes of structured and unstructured content. “Our range of analytical tools—including predictive modeling, data mining, scorecards, and machine learning—helps publishers to automate their existing processes, resulting in increased efficiencies at a lower cost,” says Vidur Bhogilal, vice chairman of Lumina Datamatics.
Bhogilal and his team also use AI, NLP, and ML technologies to automate Lumina Datamatics’ copyediting processes. Then there is cognitive analysis and Smart Test technology for analyzing the quality and suitability of resources.
For Integra Software Services, the past four years have seen the company investing in AI and NLP. “Today, close to 80% of all our journal-content processing—from editorial services to final delivery—passes through various Integra products that are built on NLP and RPA [robotic process automation] frameworks,” says Sriram Subramanya, company founder and CEO, adding that AI and NLP “bring improvements in productivity to tasks or functions where publishers face talent shortages or high cost structures, or where the processes are tedious or have longer lead times, resulting in lengthier publication cycles. These are most apparent in the acquisition stage, involving tasks such as manuscript screening, peer review, and copyediting.”
AI and NLP, Subramanya says, “help publishers predetermine the level of investments that are needed in copyediting a list, or in a specific discipline, by identifying a well-written article or chapter within a couple of minutes instead of days.” These technologies, he adds, improve the publisher’s in-house staff productivity, bring about faster times to market, offer better budget predictability, and reduce overall costs from acquisition to publishing.
Academic publishers have benefitted more from AI and NLP implementation than publishers in other segments, Subramanya says. “These technologies have great potential in improving the productivity of acquisition editors, peer review management staff, copyeditors, and the production department.”
As for Newgen KnowledgeWorks, the opportunity for authors and publishers to self-serve (by using various tools on offer) has seen its focus shifting increasingly toward supporting users, deploying tools, and training teams around the world on its solutions. “Increasingly, we will be creating our own content to support the impressive technologies—based on AI, ML, NLP, Big Data, or analytics, for instance—that we have developed,” company president Maran Elancheran says.
Newgen’s copyediting platform, CEGenius, for instance, is increasingly using AI to bring automation to content preparation and pre-editing routines. “Our tools are now able to largely automate the structuring and pre-editing stages, so that editorial staff are able to focus on value-added language editing,” Elancheran explains. “While automation is improving how we prepare, edit, and evaluate content, this is always a process guided by human intervention and understanding. The adage that the system is only ever as good as the information it contains is never more relevant than today. So it is vital that our editorial and technical staff understand the process, standards, and principles behind the rules our tools apply. Only then can we confidently deal with the exceptions that crop up from time to time.”
Machine learning is also helping the Newgen team to continuously improve and adapt its automated composition tools, whether in InDesign or LaTeX, to produce better-quality outputs with predictable consistency and speed. “Across all our project-management frameworks, from internal tools to My Own Book and PubKit, intelligent scheduling tools are helping us to manage workload and prioritize tasks according to the unique requirements and history of each piece of content,” Elancheran adds. “As the pressure on turnaround time increases, this ensures that the machine takes the strain in pushing content to the next stage of the workflow, be that an automated tool or a human for language editing or QA. Such tools are helping us to continuously drive down turnaround time while improving efficiency.”
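Stripped to its essentials, that kind of scheduling is a prioritization problem, as the following sketch suggests; the task list and the ordering rule are invented for illustration rather than drawn from Newgen’s tools:

```python
# Hypothetical sketch of deadline-driven task prioritization.
# The tasks and the ordering rule are invented for the example.
import heapq
from datetime import date

tasks = [
    ("language editing", date(2019, 7, 12), "journal article"),
    ("QA", date(2019, 7, 8), "book chapter"),
    ("composition", date(2019, 7, 8), "journal article"),
]

# Earlier due dates first; journal articles ahead of books on the same date.
queue = [(due, 0 if kind == "journal article" else 1, name) for name, due, kind in tasks]
heapq.heapify(queue)

while queue:
    due, _, name = heapq.heappop(queue)
    print(due, name)
# 2019-07-08 composition
# 2019-07-08 QA
# 2019-07-12 language editing
```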
For DiTech Process Solutions, AI-based automation is present in every facet of 3ClicksMaster, its cloud-based cross-media publishing platform. “We have incorporated ML/NLP-driven content modeling, confidence scoring–based decisions, and learning algorithms to train data for better models and more efficient automation,” says founder and CEO Nizam Ahmed. “AI provides us with the solutions and approaches to streamline the production processes to deliver the final product quickly and with better quality to the market.”
The digital solutions industry, Ahmed adds, “started from a monolithic, assembly-line workflow (using manual processes), which then became technology-driven when a rule-based system was introduced to enhance the process. Today, it is a lean workflow that leverages AI, NLP, machine learning, expert systems, and so much more. For DiTech, these new technologies and automation have enabled us to build 3ClicksMaster, which is a cost-efficient platform for high-quality STM books and journals, travel guides, schoolbooks, and backlist data conversion with 100% accuracy and much reduced turnaround time.”
TNQ Technologies, company CEO Abhigyan Arun says, has been using ML within its production processes for many years. “We use our extensive data repositories to train prediction engines that complement our rule sets. We are now embarking on exposing this functionality as discrete services that can be used by our clients directly. These will be in the form of user-friendly APIs built on a scalable microservices-based architecture.”
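Exposed as a service, such a prediction engine might look, in miniature, like the sketch below; the endpoint name, payload fields, and stand-in scoring function are invented and do not describe TNQ’s actual APIs:

```python
# Hypothetical sketch of wrapping a prediction engine as a small web service.
# Endpoint, payload fields, and scoring logic are invented for illustration.
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_copyedit_effort(text: str) -> float:
    """Stand-in for a trained model: longer sentences -> more predicted effort."""
    sentences = [s for s in text.split(".") if s.strip()]
    avg_len = sum(len(s.split()) for s in sentences) / max(len(sentences), 1)
    return min(avg_len / 40.0, 1.0)

@app.route("/v1/copyedit-effort", methods=["POST"])
def copyedit_effort():
    payload = request.get_json(force=True)
    score = predict_copyedit_effort(payload.get("text", ""))
    return jsonify({"effort_score": round(score, 2)})

if __name__ == "__main__":
    app.run(port=8080)  # each such service can be deployed and scaled independently
```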
Aiding Content Discoverability
Publishers can leverage AI in analytics and marketing to find the right audience for a title, or to help readers discover what they need. “Online book recommendation on many retailer sites currently uses some form of AI that makes guesses based on past purchases and browsing,” says Uday Majithia, assistant v-p of technology, services, and presales at Impelsys.
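At its simplest, that kind of guessing from past purchases is co-purchase counting, as in the following sketch; the order history and titles are invented for illustration:

```python
# Hypothetical sketch of co-purchase recommendation; the orders are invented.
from collections import Counter

orders = [
    {"Dune", "Foundation", "Hyperion"},
    {"Dune", "Hyperion"},
    {"Foundation", "Neuromancer"},
]

def recommend(book: str, top_n: int = 2) -> list[str]:
    """Suggest titles most often bought together with the given book."""
    co_bought = Counter()
    for order in orders:
        if book in order:
            co_bought.update(order - {book})
    return [title for title, _ in co_bought.most_common(top_n)]

print(recommend("Dune"))  # ['Hyperion', 'Foundation']
```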
AI-based applications are being developed to assist authors and publishers. “AI applications, with minimal human intervention, can be used in tasks like text analysis, detecting plagiarism in new manuscripts, detecting false statistical results, finding new peer reviewers, content searches, and semantic searches,” Majithia adds. “AI can bring efficiency to these processes by reducing human intervention, which also reduces human bias. Then there is the possibility of automating the entire peer review process. While humans cannot be entirely replaced by machines for peer reviewing, AI can be applied to speed up the process.”
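Reviewer matching, for example, often comes down to measuring how similar a manuscript is to the texts a reviewer has written or handled before. The sketch below uses plain TF-IDF similarity with invented reviewer profiles; production systems rely on far richer signals:

```python
# Hypothetical sketch of reviewer matching by text similarity.
# Reviewer profiles and the abstract are invented for the example.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reviewer_profiles = {
    "Reviewer A": "machine learning for protein structure prediction",
    "Reviewer B": "organic chemistry synthesis of polymers",
    "Reviewer C": "deep learning applied to medical imaging",
}

manuscript_abstract = "A deep learning model for tumor detection in medical images."

corpus = list(reviewer_profiles.values()) + [manuscript_abstract]
vectors = TfidfVectorizer(stop_words="english").fit_transform(corpus)
scores = cosine_similarity(vectors[-1], vectors[:-1]).flatten()

ranked = sorted(zip(reviewer_profiles, scores), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.2f}")   # Reviewer C ranks highest here
```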
AI/NLP can be deployed to benefit publishers in order to increase discoverability, user engagement (and conversions), workflow efficiency, and output quality as well as to optimize each publication’s marketing and sales strategies, says executive director and CEO Vinay Singh of Thomson Digital. “Intelligent machine learning algorithms can power auto-taggers to tag articles accurately and identify wrongly assigned tags. This makes it easier for publishers to implement and manage taxonomies,” Singh explains, adding that publishers can “leverage NLP to extract key phrases from various sections of the content and rank them based on their importance. This would ultimately improve content categorization and, by extension, content discovery.”
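Key-phrase extraction can be sketched, very crudely, as frequency ranking of candidate phrases; the stop-word list and sample text below are invented, and real rankers use far more sophisticated NLP than this:

```python
# Hypothetical sketch of key-phrase extraction by frequency ranking.
# The stop-word list and the sample passage are invented for the example.
from collections import Counter
import re

STOPWORDS = {"the", "of", "and", "in", "to", "a", "is", "for", "on", "with"}

def key_phrases(text: str, top_n: int = 3) -> list[tuple[str, int]]:
    """Rank two-word candidate phrases by how often they occur."""
    words = re.findall(r"[a-z]+", text.lower())
    bigrams = [
        f"{a} {b}"
        for a, b in zip(words, words[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    ]
    return Counter(bigrams).most_common(top_n)

sample = ("Gene expression profiling shows that gene expression varies across "
          "tissue types; profiling of tissue types guides treatment.")
print(key_phrases(sample))
# [('gene expression', 2), ('tissue types', 2), ('expression profiling', 1)]
```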
With NLP technology, publishers can automate simple editing and formatting tasks, and focus their energy on adding greater value to the content, Singh adds. “By using AI/NLP to score articles for quality, journal submissions that are high quality can advance straight to the typesetting and composition stage. And because editing is often the most time-consuming part of the production process, fast-tracking high-quality articles can save a significant amount of time for publishers while also improving the author experience.”
AI, Singh adds, can be used to develop algorithms that help researchers. “For instance, researchers can leverage the algorithms developed to find entities, trending topics, and even coauthors and reviewers in their area of specialization. Equally, machine learning can be used to extract sub-figures and captions from compound figures and even separate labels from their associated images. This will enable each item to be searched and retrieved individually. Besides, the automation will help publishers reduce the cost of segmentation, and extract and organize valuable information so that researchers can search, compare, and recommend images more precisely and easily.”
One of the key focus areas at Thomson Digital, Singh says, “is robotic QA, in which content quality can be evaluated automatically, with algorithms put in place to make the machine capable of learning and improving on its own. This will ensure that all DTD rules are followed and that changes are amended as well as verified.”
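The DTD-conformance portion of such a QA pass can be illustrated with the lxml library; the mini DTD and the sample record below are invented and are not Thomson Digital’s actual schema:

```python
# Hypothetical sketch of automated DTD validation using lxml.
# The mini DTD and the sample record are invented for the example.
from io import StringIO
from lxml import etree

DTD_SOURCE = StringIO("""
<!ELEMENT article (title, abstract)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT abstract (#PCDATA)>
""")

dtd = etree.DTD(DTD_SOURCE)
document = etree.fromstring("<article><title>On AI in publishing</title></article>")

if dtd.validate(document):
    print("DTD check passed")
else:
    for error in dtd.error_log.filter_from_errors():
        print("DTD violation:", error.message)  # e.g. the missing <abstract> element
```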
Using Big Data to inform strategic decisions on the type of content to publish, ways to sell the content, and the appropriate marketing activities to boost revenues is one of the main opportunities for publishers, says Rahul Arora, CEO of MPS. “This is where AI comes in. By using ML, computers can consume data, analyze it, and recommend the next steps. Chatbots, for instance, have been used in many retail sites for better customer support, and publishers can use a similar approach as well. At MPS, we have been enhancing our platforms and systems to process Big Data across multiple publishing areas such as learning, editing, bookselling, and rights and permission.”
The past year has seen MPS working on solutions featuring chatbots for clients in the areas of onboarding, employee self-service, performance support, customer support, and game-based learning. Variable tracking, analytics, and multimedia displays are among the rich array of features developed for these solutions.
MPS’s Think360 delivery platform, for instance, uses AI and ML to recommend better book titles to readers. “Publishers use backlists to look at increasing their revenues, and AI can better help them access analytics to leverage the current buying trends in a market,” Arora adds, pointing out that “AI can also provide a more accurate approach for publishers by analyzing a learner’s requirements and understanding of concepts, thereby tailoring the content to suit the learner’s needs. In the online media space, AI is being used on the editorial and advertising side of the business, and many are using AI to create bite-sized content and news stories for millennials.”
Going Beyond Cost Savings
But there are issues in AI/NLP applications. “AI has been used in the past and evolved well,” says A.R.M. Gopinath, executive v-p at DiacriTech. “It is a natural progression to our software capabilities and has been gently eased into normal production processes. NLP, on the other hand, is definitely yet to take off effectively. There is a lot more learning needed on the systems; errors are not being captured as effectively yet. While we do work in NLP, complex content is not ready to be dealt with using NLP, and we are yet to bring a workable real-world situation live without human intervention. NLP with assisted copyediting is much more practical and reliable.”
Everybody is still eyeing cost reduction through automation and the use of AI. “The opportunities are really boundless, and these solutions are helping the industry face the challenges posed by changing revenue models and shifting market conditions,” says Elancheran, of Newgen KnowledgeWorks. “While this is an exciting time full of opportunities, the industry still has a very firm focus on author experience, quality, and service. Vendors that can bring all of these together—ensuring that the workforce has the right skill set to embrace new technologies and not employing technology indiscriminately—will thrive.”
AI/NLP should be primarily used to improve author experience and not necessarily to reduce costs, says Ravi Venkataramani, cofounder and CEO of Exeter Premedia Services. “In the long run, publishers that will do well are the ones that provide the best service for their authors. The focus of the industry should therefore be to enhance the author experience and streamline and improve the efficiency of the various workflows through AI, ML, and NLP.”
There is an artificial urgency with all publishers to use AI/ML technologies in their workflows, says Shanti Krishnamurthy, chief domain officer at TNQ Technologies. “It is unclear whether this desire is driven by the need to lower costs, or a deeper and essential need to generate more data and design data-driven workflows. The challenge is doing it at an enterprise level, factoring in that the legacy data was designed, collected, and stored for other purposes. So the ability of an organization to collect new and good-quality data and couple it with the legacy data is an important factor that requires a change in the mind-set at all levels of people in the organization.”
Some of the larger publishers have attempted to build in-house ML-based technology, adds Arun, of TNQ Technologies. “But the challenge of data availability, as well as the maturity of existing solutions from suppliers such as TNQ Technologies, did not justify the ROI on their internal effort. For smaller publishers, AI/ML is still a mystery. Our goal is to give them more awareness about these technologies and what to expect realistically so that they can make informed decisions about the use of AI/ML irrespective of whether the product is in-house or supplied by vendors.” (Arun and Mike Hepp, v-p of operations and technology of Sheridan Journal Services, presented a paper, “Demystifying Artificial Intelligence,” at the Society for Scholarly Publishing’s annual meeting in May.)
For Arora, of MPS, reliable code written efficiently and elegantly in a modular manner is not AI or NLP. Nor is the automation of processes using great code. “But tasks that are routine and can or have been automated are a great starting point for AI. Last year, we deployed AI in a project where the value chain was based on content complexity determined by our smart engine. We also integrated an NLP application as the decision-support system for copyediting for another customer.”
MPS, Arora says, has been using smart automation in its content transformation process across the author-to-reader value chain for some time now. “We have embarked on a journey of establishing cognitive production systems in the value chain. These systems, which are built on the principles of AI, ML, and NLP, offer solutions such as content profiling and cognitive QC to enable publishers to publish more, faster, and cheaper.”